
    Who Learns Better Bayesian Network Structures: Accuracy and Speed of Structure Learning Algorithms

    Three classes of algorithms to learn the structure of Bayesian networks from data are common in the literature: constraint-based algorithms, which use conditional independence tests to learn the dependence structure of the data; score-based algorithms, which use goodness-of-fit scores as objective functions to maximise; and hybrid algorithms that combine both approaches. Constraint-based and score-based algorithms have been shown to learn the same structures when conditional independence and goodness of fit are both assessed using entropy and the topological ordering of the network is known (Cowell, 2001). In this paper, we investigate how these three classes of algorithms perform outside the assumptions above in terms of speed and accuracy of network reconstruction for both discrete and Gaussian Bayesian networks. We approach this question by recognising that structure learning is defined by the combination of a statistical criterion and an algorithm that determines how the criterion is applied to the data. Removing the confounding effect of different choices for the statistical criterion, we find using both simulated and real-world complex data that constraint-based algorithms are often less accurate than score-based algorithms, but are seldom faster (even at large sample sizes); and that hybrid algorithms are neither faster nor more accurate than constraint-based algorithms. This suggests that commonly held beliefs on structure learning in the literature are strongly influenced by the choice of particular statistical criteria rather than just by the properties of the algorithms themselves. Comment: 27 pages, 8 figures

    Who learns better Bayesian network structures: Accuracy and speed of structure learning algorithms

    Three classes of algorithms to learn the structure of Bayesian networks from data are common in the literature: constraint-based algorithms, which use conditional independence tests to learn the dependence structure of the data; score-based algorithms, which use goodness-of-fit scores as objective functions to maximise; and hybrid algorithms that combine both approaches. Constraint-based and score-based algorithms have been shown to learn the same structures when conditional independence and goodness of fit are both assessed using entropy and the topological ordering of the network is known [1]. In this paper, we investigate how these three classes of algorithms perform outside the assumptions above in terms of speed and accuracy of network reconstruction for both discrete and Gaussian Bayesian networks. We approach this question by recognising that structure learning is defined by the combination of a statistical criterion and an algorithm that determines how the criterion is applied to the data. Removing the confounding effect of different choices for the statistical criterion, we find using both simulated and real-world complex data that constraint-based algorithms are often less accurate than score-based algorithms, but are seldom faster (even at large sample sizes); and that hybrid algorithms are neither faster nor more accurate than constraint-based algorithms. This suggests that commonly held beliefs on structure learning in the literature are strongly influenced by the choice of particular statistical criteria rather than just by the properties of the algorithms themselves. CEG and JMG were supported by the project MULTI-SDM (CGL2015-66583-R, MINECO/FEDER).

    Probabilistic Network Modeling in Complex Systems

    ABSTRACT: Complex systems appear in a wide range of disciplines. They are composed of many interacting elements, and they give rise to collective phenomena that cannot be inferred from the properties of their constituent elements. During the last decades, the main methods and tools to study and characterise complex systems have been developed in the area of complex networks in statistical physics. In parallel, probabilistic networks have been developed as generic modelling and prediction methods in data science. This thesis explores the methodological synergies between complex networks and probabilistic networks, focusing on the development of suitable approaches to model and predict the behaviour of complex systems from data. The focus is on two selected systems: a gene regulatory system and Earth's climate system.